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Abstract 

We report on some extensive computations and experiments concerning the mo- 
ments of quadratic Dirichlet L- functions at the critical point. We computed the 
values of L(l/2, Xd) for —5 x 10^'' < d < 1.3 x 10^^ in order to numerically test con- 
jectures concerning the moments Yl\d\<x -^(l/^i Xd)'^- Specifically, we tested the full 
asymptotics for the moments conjectured by Conrey, Farmer, Keating, Rubinstein, 
and Snaith, as well as the conjectures of Diaconu, Goldfeld, Hoffstein, and Zhang 
concerning additional lower terms in the moments. We also describe the algorithms 
used for this large scale computation. 



1 Introduction 

Let D he a squarefree integer, D ^ 0, 1, and let K = Q(a/D) be the corresponding 
quadratic field. The fundamental discriminant d of K equals D if D = 1 mod 4, and AD 
if _D = 2, 3 mod 4. Let Xd{n) be the Kronecker symbol (^), and L{s,Xd) the quadratic 
Dirichlet L-function given by the Dirichlet series 

Lis,Xd) = J2^^ 3f?(s)>0, (1) 

n=l 

satisfying the functional equation 

Hs, Xd) = \d r ^ Xis, a)L(l - s, Xd) , (2) 



where 



^ s-ir(i^) Jo ifrf>0, 

X(s,a) = Tc 2 \ ^ , , a = < (3) 

^ ' ' r(^) [1 if < 0. ^ ' 
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In this paper, we describe some experiments concerning the moments of quadratic 
Dirichlet L-functions at the central critical point: 



E ^(l/2,x.)^ (4) 

deD(X) 

Here, /c is a positive integer, and D{X) denotes the set of fundamental discriminants with 
\d\ < X. 

Several conjectures exist for these moments. For instance, Keating and Snaith [KSn], 

motivated by the fundamental work of Katz and Sarnak [KS] and based on an analogous 
result in Random Matrix Theory, conjectured a formula for the leading asymptotics of (4). 
Specifically, they conjectured that, as X ^ oo, 

-L- Z i(l/2.X.)'~a.n(4log(A-)^, (6) 

' ^ ^ ' deD{X) j=i ^ 

where is an arithmetic factor, described by Conrey and Farmer [CF] , of the form 

fc(fe+i) , , , 

(-0^ (-^) -(^-^) I 



n 



1 + ; 



P 



+ - 
P 



\ J 



In a few cases, Keating and Snaith's conjecture agrees with known theorems, e.g. Jutila 
for A; = 1, 2 [J], and Soundararajan for /c = 3 [S]. 

Subsequently, a more precise asymptotic expansion for (4) was predicted by Conrey, 
Farmer, Keating, Rubinstein, and Snaith [CFKRS]. The lower terms are affected by the 
form of the Gamma factors in the functional equation for L(s, Xd)i so naturally they con- 
sidered the subset of d > separately from d < 0. Therefore, let 

D^{X) = {d e D{X) : d > 0} 

D_{X) = {deD{X):d<Q}. (6) 
The conjecture of CFKRS states that: 

E L(l/2,x.)'~ VQ±(^,logX), (7) 

, ^ 

deD±(X) 

where Q±{k.,x) is a polynomial of degree k{k + l)/2 in x. The fraction S/tt^ accounts for 
the density of fundamental discriminants amongst all the integers. In their paper, CFKRS 
also conjectured a remainder term of size 0{X^/'^^^) but the evidence, both theoretical and 
numerical (discussed below), suggests the existence of lower order terms for /c > 3. 
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The beauty in their conjecture hes in the fact that it gives a formula for the polynomial 
in question. The polynomial Q±{k, log X) is expressed in terms of a more fundamental 
polynomial Q±{k,x) of the same degree that captures the moments locally: 

Q±(A;,logX) = l^ Q±{k,logt)dt. (8) 

The polynomial Q±{k,x) is described below, and its leading coefficient agrees with the 
conjecture of Keating and Snaith (5). 

The polynomial Q±{k, x) of CFKRS is given by them as the A;- fold residue: 

(9) 

where 

k ^ 

G±{zi,...,Zk)^Ak{zi,...,Zk)YlX{- + Zj,a)-^^^ H Ci^ + Zi + Zj), (10) 

j=l l<i<j<k 

and 

A{zi,...,zi)^ n i^-^) (11) 

l<i<j<k 

is a Vandermonde determinant. Here, a = for and a = 1 for X{s,a) is given 
in (3), and equals the Euler product, absolutely convergent for \^Zj\ < 1/2, defined by 



A,{z^,...,z,) = ll n (l-^IT^Ti-) 



X 



More generally, CFKRS predicted that for suitable weight functions 

J2 L{l/2,Xdfg{\d\)^-^ Q^ik,logt)g{t)dt. (13) 

d6D±(oo) 

The method that CFKRS used to heuristically derive this formula rehes on number 
theoretic techniques, specifically the approximate functional equation, but was guided by 
analogous results in random matrix theory to help determine the form of the conjecture. 
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An alternative approach for conjecturing moments exists. In their paper [DGH], Dia- 
conu, Goldfeld, and Hoffstein used the double Dirichlet series 



Zk{S,W)= yj 

deI3(oo) ' ' 

to study the moments of L{l/2,Xd)- In particular, they showed how one can derive a 
formula for the cubic moments of L(l/2, Xd) by investigating the polar behavior of Zs{s, w). 

The method of DGH produces a proof for the cubic moment of L{l/2, Xd)- Specifically 
they show that the difference of both sides of (7) ioi k — 3 is, for any e > 0, of size 
0^{X^+'), where 9 = .85366.... They also gave a remainder term for a smoothed cubic 
moment with 6 = 4/5. Recently, Young [Y2] has obtained an improved estimate for 
the remainder term of size 0(X"^/^+^). The moments he considered were smoothed, and, 
for simplicity, he considered the subset of discriminants divisible by 8 and positive. The 
appearance of 3/4 + e is very interesting in light of the next paragraph. 

The method of DGH predicts the existence of a further lower order term of size X^^^. 
In particular, DGH conjectured that there exists a constant b such that 

J2 L{l/2,xdf = ^XQ{3,logX) + bX'i +0(^X-^+^y (14) 

deD{X) 

The existence of such a term comes from a pole of the double Dirichlet series at w = 3/4 
and s = 1/2, the conjectured meromorphic continuation of Z3(l/2,w) to ^{w) < 3/4, and 
assumes a growth condition on ^3(1/2, w). 

DGH also suggest that additional lower order terms, infinitely many for each A; > 4, are 
expected to persist. The form of these terms is described in Zhang's survey [Z], along with 
an exposition of the approach using double Dirichlet series. Their conjecture involving 
lower terms is stated in the following form. For k > 4, and every e > 0, 

00 

L(l/2,x.)'^ = $:x('+^)/(^')pKlogx) + 0(xV2+^), (15) 

deD(X) 1=1 

where every Pi is a polynomial depending on k. In particular. Pi is a polynomial of degree 
k{k + 1)/2, presumably agreeing with the polynomial predicted by the CFKRS conjecture, 
but to our knowledge this agreement has not been checked. 

Zhang [Z2] further conjectured that b ^ —.2154, and, in a private communication to 
one of the authors, reported that he also computed the constants associated with the X^^'^ 
term when one restricts to d < 0, or to d > 0, thus predicting: 

J2 L{ll2,xdf = ^XQ±(3,logX) + 6±xt+o(x^+^), (16) 

d&D±{X) 
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with 6+ —.14 and 6_ ^ —.07 (note that b — b+ + His evaluation of b, 6+, and 6_ 
involves an elaborate sieving process, and also depends on unproven hypotheses regarding 
the meromorphic continuation and rate of growth of Z^. 

While it might seem that a term as large as X^^'^ in the cubic moment should be 
easily detected, two things make it very challenging in this context: the small size of 
the constants involved, and also the fact that the remainder term, conjecturally of size 
0{X^^^~^^), dominates even in our large data set - presumably the X'^ can get as large as 
some power of log(a;), which can dominate X^^^ even for values of X as large as 10^^. 

For this reason, we embarked on a large scale computation in order to see whether 
such a lower main term in the cubic moment could be detected or not. We also carried out 
extensive verification of the predictions of CFKRS for A; = 1, . . . , 8. While CFKRS provided 
some modest data in [CFKRS], for \d\ < 10^, we carried out tests for —5 x 10^° < d < 
and < c? < 1.3 X 10^°. In order to dampen the effect of the noisy remainder term, we 
also considered smoothed moments. 

Our numerical results are described in Section 2. Interestingly, they lend support to 
both the full asymptotic expansion conjectured by CFKRS, and to the existence of lower 
terms predicted by DGH and Zhang. 

In Section 3 we describe the two methods that we used to compute a large number 
of L{l/2,Xd), for d < and, separately, for d > 0. The first, for o? < 0, is based on the 
theory of binary quadratic forms, and uses Chowla and Selberg's X-Bessel expansion of the 
Epstein zeta function [CS] . The second, for d > 0, uses a traditional smooth approximate 
functional equation. Both methods have comparable runtime complexities, but the former 
has the advantage of being faster by a constant factor. See the end of the paper for a 
discussion that compares the two runtimes. 

2 Numerical Data 

In this section, we numerically examine the conjectures of CFKRS, DGH, and Zhang for 

the moments of L{l/2, Xd)- 

The collected data provides further evidence in favour of the CFKRS conjecture con- 
cerning the full asymptotics of the moments of L{l/2^Xd)- With respect to the remainder 
term, the numerics also seem to suggest the presence of additional lower terms as predicted 
by DGH and Zhang. 

In Tables 1-2 and Figures 1-4 we depict the quantities 




deD±{X) 



(17) 
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and the related difference 

A^{k,X):^ J2 L{l/2,xd)' r Q±{k,logt)dt, (18) 

d£D±iX) ^ 

for A; = 1, . . . , 8 and both positive and negative discriminants d. 

The quantity (17) measures the consistency of CFKRS prediction, while (18) allows 
one to see the associated remainder term. The numerator of (17) was calculated by com- 
puting many values of L{l/2,Xd), using the methods described in the next two sections. 
The denominator was obtained from numerically approximated values of the coefficients 
of Q±{k,\ogt), computed in the same manner performed in [CFKRS], though to higher 
precision. Tables of the coefficients of the polynomials Q±{k,x) can be found in [CFKRS]. 
These values were then also used in graphing the difference (18). 

Tables 1-2 provide strong numerical support in favor of the asymptotic formula pre- 
dicted by CFKRS, described in equations (7) to (9), for both d < and d > 0, agreeing to 
7-8 decimal places for k = 1, and 4-5 decimal places for /c = 8. 

In the figures below we depict thousands of values of the quantities (17) and (18) at 
multiples of 10^ i.e. X = 10\ 2 x 10^ .... We display data up to X = 1.3 x 10^° for > 0, 
and X — 5x 10^° for d <0. The larger amount of data for d < reflects the faster method 
that we used for computing the corresponding L- values. 

In Figures 1 and 2, notice that each graph fiuctuates tightly about one, with the extent 
of fiuctuation becoming progressively larger as k increases, as indicated by the varying 
vertical scales. The graphs show excellent agreement with the full asymptotics as predicted 
by CFKRS across all eight moments computed, for both d < and d > 0. One does also 
notice a slight downward shift from 1 in the A; = 3 plots, as predicted by DGH and Zhang. 

We also depict in Figures 3 and 4 the differences (18) as dots, as well as the running 
average of the plotted differences as a solid curve. While we plot the average every lO'^, 
these running averages were computed by sampling the differences every 10^. We chose to 
display 1/lOth of the computed values in order to make our plots more readable given the 
limited resolution of computer displays and printers. 

Averaging has the effect of smoothing the moment and reducing the impact of the noisy 
remainder term. It allows one to more clearly see if there are any biases hiding within the 
noise. This running average gives a discrete approximation to the smoothed difference: 

A±(fc,X):= Yl L{l/2,Xd)'{l-\d\/X) r Q^{k,logt){l-t/X)dt, (19) 

deD±{x) ^ "'1 

We make several observations concerning Figures 3 and 4. First, for A; = 1, the observed 
remainder term is seemingly of size much smaller than Goldfeld and Hoffstein's 

proven bound of 0(X^^/^^"'"^) for the first moment, and also smaller than the bound of 
0(X^/^"'"^) implied in their work for a smoothed first moment (for the latter, see also 
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Young [Y]). Furthermore, there appears to be a bias in the average remainder term. This 
is especially apparent for d < and further supported by our log log plots in Figure 6. 

For k = 2, the remainder term, even when averaged, fluctuates above and below 0. The 
largest average remainder for /c = 2 in our data set was of size roughly 1.3 x 10^, consistent 
with a conjectured remainder, for A; = 2, of size 0{X^^^'^^). 

In Figures 3 and 4, a bias of the kind predicted by DGH can be seen. For k > 3 
a noticeable bias is evident in the remainder term, especially when averaged, and most 
prominently for d > 0. 

We elaborate on this last point further. In the case of A; = 3, DGH predict a single 
main lower term of the form bX^^^ and, as described in the introduction, Zhang worked 
out the value of 6, separately for d < and d > 0, as equal to —.07 and —.14 respectively. 

One possible explanation for the fact that the bias appears more prominently for d > 0, 
even though we have less data in that case, is that the 'noise' appears numerically to be 
much larger for o? < 0. Comparing Figure 3 with Figure 4, we notice that the plotted 
remainder terms seem to be about ten times larger for d < as compared to d > 0, even 
when restricted to \d\ < 1.3 x 10^°. A larger amount of 'noise' in the remainder term 
makes it harder to detect a lower order term hiding within the noise, especially when the 
the lower terms are married to such small constant factors: —.07X^^^ and —.lAX^^^, as 
predicted by Zhang, before averaging, and 4/7 as large after averaging over X. 

Thus, even though one expects, in the long run, to see an X^^^ term dominate over 
noise of size in the ranges examined it seems that the noise has a large impact. 

Another factor possibly affecting the poorer quality of the lower term detected when 
c? < is that the predicted constant factor is about one half as large: —.07 for d < 0, 
compared to —.14 for d > 0. Combined with noise that is ten times bigger, it is not 
surprising that the quality of the average remainder term for d < seems to be more 
affected by the noise. 

In Figure 5 we redisplay the k = 3 plots from Figures 3 and 4, zoomed in to allow one 
to see the average remainder term in greater detail. Here we also depict the prediction 
of Zhang (dashed line). More precisely, the dashed line represents the average predicted 
lower term: 

^ J\±t'/*dt = ^b^x'/\ (20) 

with 6_ = —.07 for (i < and = — .14 for d > 0. For d > 0, the fit of the average 
remainder term against Zhang's prediction is very nice. For d < 0, the plot supports a 
bias in the sense that the average value is mainly negative, but the fit against — .07|X^/^ 
is far from conclusive. 

Plots of the average reaminder term on a log log scale are shown in Figure 6, for 
1 < k = 4, and positive/negative d. On a log log scale, a function of X of the form 
/(X) = BX^^^ is transformed into the function of -u = logX given by log-B + 3/4-u, i.e. 
a straight line with slope 3/4. We compare, in the A; = 3 plots, the log log plot of the 
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average remainder against Zhang's prediction. The fit is especially nice for d > 0, but only 
in crude qualitative terms for a? < 0, consistent with the observations made concerning the 
poorer fit in the k = 3, d < 0, plots of Figures 3-5. 

For A; > 4, DGH predict infinitely many lower terms, with the largest one of size 
times a power of logX which they did not make explicit. Interestingly, a bias in support of 
this does appear evident, especially ior k — A and d > 0. In that log log plot, the remainder 
term does seem reasonably straight suggesting a lower order term obeying a power law, 
perhaps with some additional powers of log. 

It is reasonable to contest that the observed biases here exist due to a persistent small 
error in the calculation of the moment polynomials or in the values of L(l/2, Xd)- In an 
effort to alleviate such concerns, the computations yielding our numerics were executed 
again, in a limited way, using higher precision. As anticipated, these higher precision 
results remained consistent with the initial results, reducing the possibility of such a bias 
existing. Furthermore, the overall excellent agreement of the computed moments with the 
predicted asymptotic formula of CFKRS supports the correctness of the computation. 
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2.1 Tables and Figures 



k 


T.deD_iX) ^(1/2, Xd)' 




R-{k,x) 


A_(fc,X) 


1 


25458527125.376 


25458526443.085 


1.0000000268001 


682.291 


1 


52401254983.398 


52401252573.351 


1.0000000459922 


2410.047 


1 


79904180421.746 


79904180600.902 


.99999999775786 


-179.156 


1 


107770905413.09 


107770904521.07 


1.0000000082770 


892.02 


1 


135908144579.9 


135908144595.65 


.99999999988411 


-15.75 


2 


695798091128.96 


695797942880.62 


1.0000002130623 


148248.34 


2 


1505736931971.7 


1505736615082.0 


1.0000002104549 


316889.7 


2 


2362905062077.2 


2362905209666.9 


.99999993753888 


-147589.7 


2 


3251727763805.6 


3251727486319.2 


1.0000000853351 


277486.4 


2 


4164586513531.5 


4164586544704.8 


.99999999251467 


-31173.3 


3 


35923488939396. 


35923434720074. 


1.0000015093023 


54219322. 


3 


82792501873632. 


82792433101707. 


1.0000008306547 


68771925. 


3 


.13470723693602cl5 


.13470723096090el5 


1.0000000443563 


5975116 


3 


.19013982678941el5 


.19013979175101el5 


1.0000001842770 


35038394 


3 


.24831500039182el5 


.24831501538879el5 


.99999993960505 


-14996973 


4 


.26221677201508el6 


.26221542614856cl6 


1.0000051326749 


13458665240 


4 


.64846065425230el6 


.64845918799277el6 


1.0000022611439 


14662595290 


4 


.10987196470794el7 


.10987187884822el7 


1.0000007814531 


.8585972el0 


4 


.15956123181403cl7 


.15956125546013el7 


.99999985180550 


-.2364610el0 


4 


.21299535514803el7 


.21299540911015el7 


.99999974665125 


-.5396212el0 


5 


.23541937472178el8 


.23541622006477el8 


1.0000134003384 


.315465701el3 


5 


.62771726711464cl8 


.62771414322685cl8 


1.0000049766089 


.312388779el3 


5 


.11106890853615el9 


.11106862772711el9 


1.0000025282480 


.28080904el3 


5 


.16628632428499el9 


.16628683849741el9 


.99999690767817 


-.51421242el3 


5 


.22724025077610el9 


.22724048423231cl9 


.99999897264693 


-.23345621el3 


6 


.24225487162243e20 


.24224780818937e20 


1.0000291578822 


.706343306el5 


6 


.69880224640908e20 


.69879554487455e20 


1.0000095901220 


.670153453el5 


6 


.12937968210632e21 


.12937887586288c21 


1.0000062316467 


.80624344cl5 


6 


.19996752978479e21 


.19997013306315e21 


.99998698166411 


-.260327836el6 


6 


.28005925088677e21 


.28006019455853e21 


.99999663046810 


-.94367176el5 


7 


.274712571777e22 


.274697762672e22 


1.00005391054 


.14809105cl8 


7 


.859431066562e22 


.859415893116e22 


1.00001765553 


.15173446el8 


7 


.166743403869e23 


.166740957095e23 


1.00001467410 


.2446774el8 


7 


.266330275024e23 


.266339641978c23 


.999964830793 


-.9366954el8 


7 


.382588166641e23 


.382591322018e23 


.999991752617 


-.3155377el8 


8 


.3351697755e24 


.3351406841e24 


1.000086804 


.290914e20 


8 


.1139465805e25 


.1139429048e25 


1.000032259 


.36757e20 


8 


.2319359069e25 


.2319282301e25 


1.000033100 


.76768e20 


8 


.3831454627e25 


.3831738559e25 


.9999259000 


-.283932e21 


8 


.5649093016e25 


.5649183210e25 


.9999840342 


-.90194e20 



Table 1: Moments Y.deD_{x)L{l/'i,Xdf versus CFKRS' ^ Q_{k,logt)dt, for k = 
1, . . . , 8 and d < 0. Five values for each k are shown, at X = 10^°, 2 x 10^°, . . . , 5 x 10^°. 
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k 




O r. V 


i?+(A-.;r) 


A+ (/,-.. Y) 


1 


4074391863.4447 


4074392042.9388 


.99999995594580 


-179.4941 


1 


8445624718.0243 


8445624023.3138 


1.0000000822569 


694.7105 


1 


12928896894.590 


12928896383.146 


1.0000000395582 


511.444 


1 


17484928279.579 


17484927921.500 


1.0000000204793 


358.079 


1 


22095062063.114 


22095062690.738 


.99999997159438 


-627.624 


2 


76310075816.466 


76310057832.320 


1.0000002356720 


17984.146 


2 


168051689378.93 


168051603484.03 


1.0000005111222 


85894.90 


2 


266303938917.29 


266303916920.62 


1.0000000825999 


21996.67 


2 


368948427173.22 


368948308826.37 


1.0000003207681 


118346.85 


2 


474942139636.16 


474942177549.68 


.99999992017235 


-37913.52 


3 


2478393690176.2 


2478391641054.5 


1.0000008267950 


2049121.7 


3 


5878735240405.9 


5878729153410.4 


1.0000010354271 


6086995.5 


3 


9720154390088.4 


9720158187579.5 


.99999960931797 


-3797491.1 


3 


13873264940982. 


13873252832529. 


1.0000008727912 


12108453. 


3 


18271480140004. 


18271496263135. 


.99999911758015 


-16123131. 


4 


.10868425484737cl5 


.10868409751016cl5 


1.0000014476562 


157337203 


4 


.27974980520169el5 


.27974915668497el5 


1.0000023182079 


648516719 


4 


.48473276073219el5 


.48473329605692el5 


.99999889563038 


-535324726 


4 


.71493167429315cl5 


.71492961664246cl5 


1.0000028781164 


2057650687 


4 


.96564046289913el5 


.96564334647659el5 


.99999701382764 


-2883577466 


5 


.57022430562904el6 


.57022322406897el6 


1.0000018967310 


10815600670 


5 


.15999737676260el7 


.15999653478756el7 


1.0000052624580 


.84197504ell 


5 


.29130430291967el7 


.29130495012249el7 


.99999777826357 


-.64720282ell 


5 


.44482716417300el7 


.44482376920928el7 


1.0000076321545 


.339496372el2 


5 


.61707290890367el7 


.61707708869778el7 


.99999322646362 


-.417979411el2 


6 


.33658290814098el8 


.33658163201404el8 


1.0000037914337 


.127612694el3 


6 


.10326933113376el9 


.10326816848898el9 


1.0000112585010 


.116264478el4 


6 


.19792425806612cl9 


.19792515491256cl9 


.99999546875969 


-.89684644el3 


6 


.31332379844474el9 


.31331890401641el9 


1.0000156212353 


.489442833el4 


6 


.44685941512069el9 


.44686487402482el9 


.99998778399367 


-.545890413el4 


7 


.215991539086e20 


.215989246213e20 


1.00001061568 


.2292873el5 


7 


.726312167992e20 


.726295668031e20 


1.00002271797 


.16499961el6 


7 


.146733199900e21 


. 1467345331 14e21 


.999990914109 


-.1333214el6 


7 


.241042340834c21 


.241036160843e21 


1.00002563927 


.6179991el6 


7 


.353694078736e21 


.353700808054e21 


.999980974547 


-.6729318el6 


8 


.1475899774e22 


.1475859642e22 


1.000027192 


.40132el7 


8 


.5449090667e22 


.5448853612e22 


1.000043505 


.237055cl8 


8 


.1161602962e23 


.1161622793e23 


.9999829282 


-.19831el8 


8 


.1981618159e23 


.1981550523e23 


1.000034133 


.67636el8 


8 


.2993403001e23 


.2993484649e23 


.9999727248 


-.81648el8 



Table 2: Moments J2d&D+{x) -^(1/2, Xd)'^ versus Q+{k, logt)dt, ioi k — 1, . . . ,8 and 

d> 0. Five values for each k are shown, X ^2 x 10^, 4 x 10^, . . . , 10^°. 
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Figure 1: These plots depict tlie ratio R^{k,X) of the numerically computed moments 
compared to the CFKRS predictions, for A; = 1, . . . , 8 and d < 0, sampled every 10'', i.e. 
at X = 10^, 2 X 10'^, . . . , 5 X 10^°. The horizontal axis is X, the vertical axis is the ratio 
R-{k,X). 
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igure 2: These plots depict the ratio R^{k,X) of the numerically computed moments 
compared to the CFKRS predictions, for A; = 1, . . . , 8 and d > 0, sampled every 10'', i.e. 
X — 10^, 2 X 10^, . . . , 1.3 X 10^°. The horizontal axis is X, the vertical axis is the ratio 

4k,x). 
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Figure 3: These plots depict tiie difference A_{k,X) between tlie numerically computed 
moments and the CFKRS prediction, for A; = 1, . . . , 8 and d < 0, sampled at X = 10^, 2 x 
10^, . . . , 5 X 10^°. The horizontal axis is X, the vertical axis is the difference A_(A;, X), and 
the sohd curve is the mean up to X of the plotted differences (see the discussion above) . 
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Figure 4: These plots depict tlie difference A_|_(A;,X) between tlie numerically compute 
moments and the CFKRS prediction, for A; = 1, . . . , 8 and d > 0, sampled at X = 10^, 2 x 
10^, . . . , 1.3 X 10^°. The horizontal axis is X, the vertical axis is the difference A+(/c, X), 
and the sohd curve is the mean up to X of the plotted differences (see the discussion 
above) . 
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Figure 5: This graph depicts the remainder term for the third moment for d < (top) 
and (i > (bottom), i.e. A_(3,X) and A_|_(3,X). The solid line is the average remainder, 
and the dashed hne is Zhang's prediction. See the discussion above concerning the quahty 
of the fit. 
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Figure 6: Plots, on a log log scale, of the absolute value of the average remainder term 
depicted in Figures 3-4, for 1 < /c < 4, d > (top four plots), and d < (bottom four 
plots). For the 3rd moment we compare to Zhang's predictions of .14|a;^/^ (3rd plot) and 
.07|x^/^ (7th plot). For the 1st moment, d < 0, there seems to be a bias of size roughly 
xV4. 



3 Our Computational Formulae 



The computations for the moments of L{l/2,Xd) hinge on the efficient computation of 
L(l/2, Xd) itself for many discriminants d. This computation is split into two cases accord- 
ing to whether d is positive or negative. In the former case we calculate L{l/2,Xd) using 
a smooth approximate functional equation for L{s,Xd)i which is representable in terms of 
the incomplete gamma function. In the latter case, we consider the Dedekind zeta function 
for the associated quadratic field, and reduce the computation of L{l/2, Xd) to a sum over 
binary quadratic forms, and the X-Bessel expansions of their Epstein zeta functions as 
determined by Chowla and Selberg [CS]. 

Testing the conjectures described in the introduction also involves numerical values for 
the coefficients of the polynomials Q±{k,x). We reran the program used in [CFKRS] on a 
faster machine and for a longer amount of time to get slightly more accurate coefficients 
for these polynomials. 

3.1 Computational Formula for L(l/2,Xd)5 d <0 
Let 

Cq(v^)(s) = C(s) L{s,Xd) 

be the Dedekind zeta function of the quadratic number field Q(\/D), and h{d) the corre- 
sponding class number. 

Let ajin^ + bj-mn + Cjn^, j = 1, . . . , h{d), be representatives for the h{d) equivalence 
classes of primitive positive definite binary quadratic forms of discriminant — AajCj — 
d<0 [L]. 

Dirichlet proved (see also [D], Chapter 6) that: 

^ hid) / X -s 

Cq(v:d) (*) = - XI XI ' y'j'^^ + ^^"^^ + ^j'^^ ) ' ^s>\ (21) 

where 

{2 d < -4, 
4 ci = -4, 
6 (i=-3, 

and where denotes the sums over all pairs (m, n) e Z^, (m,n) ^ (0,0). 

Chowla and Selberg [CS] obtained the meromorphic continuation of the Epstein zeta 
function 
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with d — b'^ — Aac < 0, a, c > 0, by giving an expansion for Z{s) as a series of X-Bessel 
functions. Specifically, they proved that 

Z{s) = 2C(2s)a-^ + ^ ^^-2^ ({28 - l)T{s - 1/2) + B{s) (22) 



and 



m{\d\yy2 

where 

aV2r(s) \d\— ^ V « / V « / 

'^c.(n) = 5^m'^, (24) 

ra\n 

K^{z) = \^^ exp(^-|(l/ + l/l/)^l/--i%, 3fJ^>0. 

ii's-i/2(2^) decreases exponentially fast as a; — )■ cxo, uniformly for s in compact sets, and 
the above expansion gives Z{s) as an analytic function throughout C except for a simple 
pole at s = 1. Note that the poles at s = 1/2 in (22) of the terms with C{2s) and r(s — 1/2) 
cancel out. 

Speciahzing the above formula to s = 1/2 gives 
Z(l/2) = 4^7 + 10/'"^'' 



02 



Stto 



o , , i'7rnb\ /Trnldh 1 
+ — Z^^o(n)cos \Ko[ ^ . (25) 



n>l 



Substituting this into (21), for a given a set of representative quadratic forms, one for each 
equivalence class, yields a formula that can be used to numerically compute L{l/2,Xd)- 
An explicit bound on Kq{x) can be obtained as follows. 



K,{x)^ exp(^-|(t/+l/t/))^ 



(26) 



so that 



\Ko{x)\ < J ^e-\ (27) 



The last inequahty can be seen by writing y + l/y — (y^/"^ — y^^^'^y + 2, changing variables 
u = x^/^(y^/^ - y"^/^), so that dy/y = x~^/'^2du/{y^/^ + y~^/^), and using, from the AGM 
inequality, y^^"^ + y~^^'^ > 2. 
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Next, we discuss implementation issues and complexity arising from this formula. 

Lagrange proved that each of the equivalence classes of primitive positive definite forms 
of discriminant d contains exactly one form ax^ + hxy + cy"^ for which —a < b < a < c or 
< b < a = c. Roughly, this is the set < |6| < a < c, with some exceptions. Furthermore, 
a < (|(i|/3)^/^. Recall that in this context primitive means that gcd(a,6, c) = 1. 

Therefore, let A{X) denote the set of triples: 

A{X) := {(a, b, c) e Z^\4ac -b'^ <X, -a <b<a<c or 0<b<a = c} (28) 
and A'{X) the set of primitive triples in A{X): 

A'{X) := {{a,b,c) G A{X)\gcd{a,b,c) = 1}. (29) 

Our first step was to distribute the computation across several processors, each one 
handling a range of discriminants, in order to speed up the computation and also to reduce 
the memory requirements per processor. Therefore, suppose < —d < X for some positive 
integer X, and let AX be a positive integer dividing X. We partitioned the interval into 
blocks, Xi_i < \d \ < Xi, of equal length AX — Xi—Xi_i for i = 1, 2, . . . , m (so X^^ :— X). 

We then looped through all integers (a, 6, c) e A'{X), with corresponding \d\ lying in 
(Xj_i,Xj], i.e. satisfying the following properties: 

fx'i ,,, b'^ + Xi^i b'^ + Xi 

0<a<\—, 0<b<a<c, ' <c< \ 30 

(taking care to throw away the terms above at the endpoints that are not in A'{X), for 
example, terms with —b — a). 

We computed d — b^ — Aac and updated L{l/2,Xd), stored in an array, using (25) to 
calculate the corresponding contribution to (21) from the triple (a, 6, c). 

Combining a < (|o?|/3)^/^ with the exponential decay of Ko{x) in (27) shows that few 
terms are needed to compute (25) to given precision. For example, the terms in (25) with 
n>7 contribute, in absolute value, less than 10~^^ to the sum, and can be ignored. Smaller 
a require even fewer terms. 

Because we are evaluating Ko{x) for a limited range of values and to machine double 
precision (15-16 digits), we used a precomputed table of the first five terms of several 
thousand Taylor series expansions: 

Ko{x) « Ko{xj) + 4'\xj){x -xj) + ... {K^'\xj)/A\){x - xj)\ (31) 

each one centred on a point xj of the form xj = j/200 with 5 < Xj < 37, j e Z. The 
interval [5, 37] was used, because, on one end, 7r3^/^ > 5, the Ihs being a lower bound for 
the smallest possible x for which we would need to evaluate Ko(x). We chose 37 for the 
other end of the interval because exp(— 37) < 10~^^, i.e. smaller than our desired precision. 
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While our code was written in C++, the precomputation of these Taylor expansions were 
carried out in Maple, with the coefficients stored in a ffie that could be read into our C++ 
program. Our C++ code was compiled with the GNU compiler GCC. 

Only five terms were computed and stored because our Taylor expansions were always 
apphed with \x — Xj\ < 1/400. Combined with the exponential decay of the Taylor coeffi- 
cients (as a function of x), at most 5 terms, and often fewer, were needed to evaluate the 
sum to within 10~^^. 

We also make note of a few of the hacks that helped to increase the speed of our 
program: 

• For a given a,b, only one cosine needs to be computed. Indeed, given cos(7r6/a), we 
can compute cos{7mb/a), for n = 1,2, ... ,7, using standard trigonometric identities. 
For instance, the double angle identity computes the expression for n = 2. 

• To test for primitivity, we must check whether gcd(a, 6, c) = 1. If one computes 
gcd(a,6) outside the c loop as previously mentioned, then for a given gcd(a,6), we 
can use 

gcd(a, 6, c) = gcd(gcd(a, 6), c mod gcd(a,6)), 

and thus compute, and then store within the c loop, at most one gcd per residue 
class mod gcd(a, b). 

• When reading an array, the computer loads blocks of consecutive bytes of the array 
from RAM into the CPU's cache where it can be accessed quickly by the CPU. On 
profiling, we found, after optimizing and streamlining the bulk of our code, that 
a significant amount of time was being spent accessing the array in which we were 
storing L(l/2, Xd) so as to increment it by the contribution from a given triple (a, b, c). 
The reason was that, as the inner c loop increments by 1, the value oi d = b^ — Aac 
changes by 4a, i.e. often a large step. Non-sequential array accesses are expensive 
timewise. We were able to significantly decrease the time spent on array accesses by 
anticipating the subsequent rf's, and using GCC's '__builtin_prefetch' function to fetch 
the corresponding array entry for L{l/2,Xd) eight turns in advance - the eight was 
determined experimentally on the hardware that we used and for the range of ci's 
that we considered. 

• The computation of (25) involves log(|(i|). Therefore, we precomputed these and 
stored them in an array, but were again faced with the same kind of expensive memory 
accesses as for the values of L{l/2,Xd)- Rather than prefetch these separately, we 
created a C++ struct to hold both 1/(1/2, Xd) and log(|d|) together. That way a single 
prefetch would load both at once. 

Remark. On combining the last two hacks, the array access portion of our code 
sped up by a factor of 4, and the overall running time of the program sped up by 
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a factor of 2. These two hacks were the last two implemented, and the speed up 
achieved indicates how expensive non-sequential memory accesses can be, and how 
optimized the rest of our code was. 

• To avoid repeatedly checking whether the quantity d = IP' — Aac is a fundamental 

discriminant, we precomputed, for each block < \d\ < Xi whether d is a fun- 

damental discriminant and stored that information in an array of boolean variables. 
We essentially sieved for squarefree numbers and this can be done, for each block of 
length AX, in 0{AX) steps, because the sum of the reciprocal of the squares con- 
verges. Hence, doing so across all blocks up to X costs 0{X) arithmetic operations 
and array accesses. No prefetching was used on this array as it did not seem to give 
a benefit, perhaps because the array constists of single bit boolean variables, rather 
than 64 bit doubles of the L- value and log(|(i|) arrays, and fits more easily within 
cache. 

• Since cos(a;) is an even function and b gets squared in the discriminant equation 
\d \ = Aac — 6^, we can group ±6 together, when possible, and restrict our attention 
to non-negative h values. Only a relatively small subset of triples cannot be paired 
in this fashion, namely when a — h — or c — a. 

• Terms such as 

2 

— (7-log(87ra)) 

appearing in the leading term of (25), depend solely on a. As such, it is to our 
advantage to compute this, and all other terms depending solely on a, outside the h 
and c loops. Similarly, we compute expressions like gcd(a,6) outside the c loop, and 
so on. While this is standard, we took it to a meticulous extreme to save on as many 
arithmetic operations as possible. 



3.2 Complexity for c? < 

First observe that the number of candidate triples (a, 6, c) defined by (28) that we must 
loop over, satisfies 

\A{X)\ - ^X'". (32) 

See [K] for a discussion on this counting problem. Furthermore, the number of triples that 
survive the condition gcd(a, 6, c) = 1 is: 

l^''^>l ~ W(zf''- '''' 
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The latter asymptotic formula was essentially stated by Gauss in his Disquitiones in con- 
nection to the sum of class numbers A'{X) = X]-x<d<o ^(^)- ^ proof, with a lower term 
and a bound on the remainder, as well as further references can be found in [CI]. 

We can get a lower bound for the amount of computation required by simply counting 
the number of triples (a, b, c) that are considered. Furthermore, because we are pairing 
together ±6, the number of triples that survive the gcd condition is roughly half of (33), 

Jtt^'^' (34) 

36C(3) ^ ^ 

The relatively small constant of 7r/36C(3) helps to makes this approach very practical. 

While the above asymptotic gives a lower bound on the number of operations of our 
method in computing all L(l/2, Xd), for < —d < X, it ignores the amount of work needed 
for each triple. Most of the work involves: checking bounds on each loop, testing whether 
gcd(a, b,c) = 1 and whether d = b'^ — Aac is a fundamental discriminant, and carrying out 
simple arithmetic and array accesses related to the evaluation of (25). 

Each arithmetic operation can be done in polynomial time in the size of the numbers 
involved (in fact, log(X)^"'"^ by using the FFT). However, in the range of discriminants we 
considered {\d\ < 5 x 10^°), the bit length is quite small: 32 bit C++ ints for a, 6, c sufficed 
and 64 bit long longs were used for discriminants. We also used 64 bit machine doubles 
for the floating point arithmetic. Therefore all arithmetic was carried out in hardware. 

Recall that we are assuming that < — d < X, and partitioning this interval as: 

1, . . . , AX, AX + 1, . ..,2AX , . . ., (;m- 1)AX + l,...,mAX , . . . , 

Block 1 Block 2 Block m 

where AX is a positive integer assumed to divide X (in practice one can take X and AX 
to be powers of, say, ten). The number of blocks equals X/AX, where, for reasons made 
clear below, we will eventually take AX » X^/^ log(X)^. 

As mentioned earlier, we precomputed, via sieving, a table of fundamental discriminants 
for each block of length AX using 0(AX) arithmetic operations and array accesses, hence 
0(X) operations across all blocks, i.e a relatively small cost compared to 0(X^/^). 

The number of terms needed in the expansion of Bessel function is proportionate 
to the number of digits of precision desired, and the number of Taylor coefficients used in 
each Taylor series for Kq{x) also depends on the ouput precision. We worked with machine 
doubles and hence about 15-16 digits precision, and, as explained in the text near (31), we 
used at most five terms in each Taylor series. 

Next we consider the time used to carry out the gcd computations. We described in the 
hacks listed above that, for each triple (a, 6, c), we computed gcd(a, 6, c) by first computing 
gcd(a, b) outside the c-loop and then computing gcd(gcd(a, 6), c mod gcd(a, b)) inside the 
c-loop, with the latter calculation being performed at most once per residue class modulo 
gcd(a, 6), and then stored in the c-loop for subsequent use. 
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The number of gcd(a, 6)'s that are computed across all blocks is 



« ^ E E E 1 « ^VAX (35) 

m< ^ ..' /mAX 0<b<a 
" — AX a< t / — = — — — 
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Next, we compute gcd{a,b,c) = gcd(gcd(a, 6), c mod gcd(a,fe)), storing these values, 
inside the a, b loops, for each residue class mod gcd(a, b) so as to only compute one gcd 
per residue class. Therefore, the number of additional gcd's required is 

« E E E g^d(a,6). (36) 



It is known that 

EEg-<i(a,6)~^^i|f; (37) 

a<xO<b<a ' 

see, for example, [Bo] or [Br]. So, from (36) and the above asymptotic, we see that the 
number of additional gcd's computations required across all blocks is 

< E mAXlog(mAX) < ^'^y^ - (38) 



Therefore, combining with (35), the total number of gcd calls is 

X'^logX 



Furthermore each gcd(a, b) can be computed, using the Euchdean algorithm, in 0(log(X)^) 
bit operations, since the binary length of both a and h is O(logX), and so the total number 
of bit operations coming from gcd's is <^ X'^\og{XY / AX . 

Hence, we can make the overall time required for gcd evaluations an insignificant portion 
of the overall time by choosing 

AX > J5^^/Mog(x)^ (40) 

as the number of bit operations for the remaining work (looping through a, 6, c and m, and 
carrying out the required integer and floating point arithmetic) is then 

0(^3/2 log(X)l+^). 

The accounts for the overall number of triples (a, fo, c) considered, and the log(X)^+^ 
for the cost of arithmetic on numbers of bit length 0{\ogX). The implied constant depends 
on the number digits of precision desired for the L- values. 
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While the best choice might seem to be to take AX equal to X so as to minimize the 
number of gcd calls, this would come at a substantial price. First, such a large AX would 
prevent us from simply distributing the computation across several processors, each one 
handling one block at a time. 

Second, the memory (RAM) requirements needed would be enormous. There is also an 
advantage to having arrays that can fit entirely or significantly within the CPU's cache, 
so as to avoid too many expensive memory fetches from RAM, and, even with smaller 
AX, there is a tradeoff between minimizing calls to the Euclidean algorithm and memory 
accesses. 

We determined a good choice of AX experimentally, since, in practice, the big-Oh 
constants in the above estimates depend on the speed of individual arithmetic and memory 
operations on given hardware and context in which they are called. 

Nonetheless, since the Euclidean algorithm is very simple, and the remaining work 
associated with looping, computing the i^T-Bessel function, and updating L- values involves 
a moderate number of arithmetic and memory operations, one expects that the benefit 
should be felt sooner rather than later. Indeed, we found that, in our range of d's, a choice 
that eliminated the gcd's as a bottleneck, while not paying too high of a cache size penalty, 
was AX — 10^, i.e. blocksizes of one million. 



3.3 Computational Formula for L(l/2, Xd), d> 

The proof of de la Vallee Poussin of the functional equation for L{s, Xd) imitates that of 
Riemann for his zeta function. It yields the analytic continuation of L{s, Xd) and also the 
following formula, an example of a 'smoothed approximate functional equation', useful for 
its evaluation: 

oo 

(d/nr/'r(s/2)L{s, Xd) = Yl W (^(^/2' ^^7^) + ^((1 - s)/2, nnyd)) (41) 

n=l 

where G{z, w) denotes the normalized incomplete gamma function 

/»oo /»oo 

G{z,w):= x'-^e-'^'^dx ^ x'-^e'^'dx ^ w-'r{z,w), > 0, (42) 

Jl Jw 

with r{z,w) the incomplete gamma function. See for instance page 69 of [D], or Section 
3.4.1 of [R], and use Gauss' formula for the Gauss sum, namely T{xd) = d^^"^, when li > 0. 

Therefore, on specializing to s = 1/2, we have a smooth approximate functional equa- 
tion for L{l/2,Xd), namely 

L{l/2,Xd)-2[-) X^Xdin) ^^^^^^ -^■^ ^ r(l/4) ' ^^^^ 



n>l ^ ' ' n>l 
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valid for positive fundamental discriminants d. 

To estimate the size of the terms being summed, first notice, from the definition, that 
G{l/4,w) > for real w. Furthermore, integrating by parts, gives an upper bound: 



oo 



G(l/4,u>) = - e-'"'^x-'''Hx < — (44) 

ty 4w w 

This inequality tells us that the terms in (43) decrease exponentially fast in the quantity 
'KT? jd, SO that, roughly speaking, we need to truncate the sum when n is of size d)-!"^ to 
achieve a small tail. 

Let us consider this estimate more carefully. Set 

2 r(i/4,f%/<i) 
= Vt r(i/4) ■ 

In light of bound (44), we have 

< f(i7i) — ^''^ 

Let the number of working digits be labelled as 'Digits'. Hence, for 



n > y ^ log(lO) ■ Digits (47) 

we generously have 

f{n) < 10-°'e"^ (48) 
Furthermore, notice that the terms start off, for smaller n and large d, with 

r{l/ A, n'^n/d) 



r(l/4) 



1. 



Therefore, it does not make sense to sum the terms beyond (47), as those terms are lost 
to numerical imprecision. 

We must thus see to what extent the ignored tail end of the sum can contribute to the 
value of L{l/2,Xd)- Summation by parts yields 

/N 
Y,Xd{n)f{t)dt. (49) 
n^iv n^jv - n<t 

So on letting iV ^ oo, we obtain 

/•oo 

L(l/2,xd) = -/ 5]xdH/'(i)rft. (50) 

•^1 n<t 
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Moreover, by subtracting (49) from (50), we get a formula for the tail: 



Xd{n)f{n) = -/(TV) J] Xd{n) - / Yl Xd{n)f{t)dt. (51) 



n=N+l n<N " " n<t 



One could use the inequality of Polya- Vinogradov, Burgess, or even the trivial bound 
IXci(^) I ^ 1) here to get a reasonable, but not optimal, estimate for the size of the tail. 
However, something closer to the truth is obtained by using the conjectured bound 

Y,X<i{n) = 0{x'l^d^). (52) 

n<x 

Combined with (46) this gives 

f{N)Y.xAn)=0(^^e--^'A, (53) 



n<N 

and, similarly, 

Xd[n)r[t)dt<^d- 1^ t-^r{t)dt<^ 

n<t 



^ Y.Xci{n)f{t)dt^d^ t-.f{t)dt<^^e-'''/', (54) 



where we have applied integration by parts to get the last bound. Applying these bounds 
to (51) and choosing 

TV = log(lO) • Digits, (55) 
gives the following bound for (51): 

V Digits^/ V 

We therefore conclude that the tail isn't much bigger than an individual term, and, in 
principle, we could compensate for the extra o?^ by taking Digits slightly larger than our 
desired output precision, say by an amount equal to elogd/ log 10. 

We remark that, by using the trivial estimate or Polya- Vinogradov inequality, we could 
get rigorous estimates with explicit constants, but larger by a factor of roughly o?^/^ as 
compared to (56). 
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3.4 Cancellation and accuracy 

We can also use the above analysis to show that our approach to computing 1/(1/2, Xd) using 
the smooth approximate functional equation is well balanced, i.e. that little cancellation 
and hence loss of precision takes place in summing (43). We consider the maximum size 
that the partial sums can attain so as to give us a sense of how many digits accuracy after 
the decimal place are attained when working with Digits decimal places. 

Consider the partial sums (49) (for a general A^', not just our specific choice of N), 
apply the conjectured bound (52), and integrate by parts: 

N' r^' 

Yl Xd{n)f{n) « f{N')N'^''d^ + dH'/^f{t) ^ + r''^f{t)dt. (57) 



n<N' 



Again, we can get a proven, though weaker, upper bound with explicit constants if we use 
a proven bound rather than the conjecture (52). 

Next, notice that r(l/4, x) < r(l/4), because the definition of the Ihs here involves 
integrating over a smaller portion of the positive real axis as compared to the rhs. Thus, 
from (45), 

f{t) < 2rV2. (58) 

Applying this to (57) gives 

Y,Xd{n)f{n)<^d'\ogN'. (59) 

n<N' 

Therefore, because we take the partial sums with N' < N — 0((d Digits) ^/^), we have, on 
adjusting e to incorporate the log d: 

Yl Xd{n)f{n) < d' log Digits. (60) 

n<N' 

Therefore, the partial sums do not get large and we thus have nearly as many digits 
accuracy beyond the decimal place as our working precision. 

Using a similar analysis, the effect of accumulated round off error can be estimated by 
replacing Xd{n) with random plus and minus ones multiplied by a factor of size lO^^'^its 
model the random rounding up or down of the terms in the sum. With high probability, 
we then get an error, due to accumulated round off of size 

(d^ log Digits) 10-°'s^*^ (61) 

If one desires rigorous, rather than experimental, values of L(l/2, Xd), an interval arith- 
metic package should be used in practice. Because our goal was to test conjectures rather 
than prove a rigorous numerical result, we were satisfied with an intuitive understanding of 
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the accuracy of our computation and carried out several checks of the values attained, for 
example comparing a similar smooth approximate functional equation for the case d < 
against select values attained by our implementation using the Epstein zeta function, and 
also using a high precision version of Rubinstein's Icalc package to test a few several values. 

3.4.1 Hacks 

We list a few hacks which were helpful in the implementation of the smooth approximate 
functional equation (43). 

• Xdin) can be efficiently computed by repeatedly extracting powers of 2 and applying 
quadratic reciprocity. 

• As in the case for < 0, it is to our advantage to partition < d < X into blocks 
and farm the work out to many processors. 

• Due to the presence of Xd{n) in the (43), it is more efficient to place the rf-loop on 
the inside and the n-loop on the outside because Xd{n) is periodic in d with period 
either n or 8n, depending on whether the power of two dividing n is even or odd. 
Furthermore, n is comparatively small compared to d, by (55). Thus, for each n we 
precomputed a table of Xdin), so as to only compute this values once per residue 
class d mod n or 8n. This pays off so long as each residue class gets hit, on average, 
more than once (perhaps slightly more because of the overhead involved in storing 
the values and looking up the array.) In our implementation, with blocks of length 
10^,0<d<1.3xl0^°, and 16 digits working precision, it was conducive to do so. 

• We computed the normalized incomplete gamma function G{z,w), evaluated ai z = 
1/4 and w — n^Tr/d, as follows. For w > 37, return (since exp(— 37) < 10~^^). For 
1 < w < 37, use a precomputed table of Taylor series, centering each Taylor series at 
multiples of .01 (so nearly 4000 Taylor series) and taking terms up to degree 7 (less 
for larger w because of the exponential decay). Otherwise, for w < 1, employ the 
complimentary incomplete gamma function 





Specifically, set 




so G{z, w) = w ^T{z) — g{z, w), and integrate by parts to get 




oo 
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where 

'^^'=|l iO = 0. 

We stored the value of r(l/4) and calculated the above series for g{l/A,w) by trun- 
cating the sum when the tail was less than 10~^^. 



3.5 Complexity for d > 

Recall that, as for the case of negative discriminants, we are partitioning the interval 
< d < X into blocks of length AX. 

The overall cost for sieving for fundamental discriminants, summed over all blocks, is a 
meager 0{X) arithmetic operations and array accesses on numbers of bit length 0{\ogX), 
as for the case of d < 0. 

Next wc estimate the overall time, summed over blocks, required to create a precom- 
puted table of characters Xd{i^) for all residue classes mod n or 8n. 

Summing over blocks m, and taking the maximum truncation point (55) that occurs 
for a given block, the time required is 

«log(X)^ E (62) 

where 



AX - 



M = y ^^^^ log(lO) • Digits. (63) 

Here we have used the fact that each character (^) can be calculated in time 0(size(d)size(n)), 
where size means binary length (see, for example, [C]), and that both d and n are of size 
0{logX) in this case. Summing, the time needed here is therefore 

^X 

So, by choosing AX ^ X^/^^*^, we can make the overall time spent on computing the 
Kronecker symbol o(X'^/^). As AX increases, there is a tradeoff between spending less 
time on the character computation and having larger arrays, similar to our computation 
of gcd's in the d < case. There is a definite advantage, depending on the particular 
hardware, to having smaller arrays, i.e. smaller AX, to reduce the number of calls to move 
data from RAM into cache. On our hardware, and in our range < c? < 1.3 x lO^'^, we 
found that a value of AX = 10® worked well. 

Thus, the bulk of the work is spent on looping, for each block, through n and d, looking 
up the precomputed character values, computing the normalized incomplete gamma func- 
tion (7(1/4, rnn? /d) to given precision, and updating the corresponding value of L{l/2, Xd) 
by the amount X(i(ri)/(n). 
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The kind of work and operations required is thus very similar to our approach for 
the d < case, with the handhng of characters similar to our handling of gcd's, and the 
approach to computing the incomplete gamma function similar to that of the i^-Bessel 
function. 

However, there is one significant difference in the two methods. For d < 0, equation (34) 
tells us that our Epstein zeta function method loops through gg^X^/^ —Ki 0.0726X^/^ 
triples a, b, c. Not only is the constant .0726 small, but the desired precision does not affect 
the number of triples required. Precision becomes a factor only in regards to computing 
the particular contribution from each triple, for example the number of terms needed for 
the various i^-Bessel Taylor series expansions. 

But, in the present case of the smooth approximate functional equation, both the length 
of the sum and the amount of work needed to compute the individual terms of the sum 
depends on the desired precision. So, the main difference in these two approaches is the 
length of the sum. 

In the case of d > 0, the length of the main d, n loops, summed over all blocks of length 
A.X, is quantified by 

Vs= EE E 1' (65) 

m<-^ ■n<M {m-l)AX<d<mAX 

with M given by (63). Simplifying the two inner sums, this quantity is easily estimated to 
asymptotically be 

3 

So, if Digits = 16, then Lpos ~ 2.28^2, which is more than twenty times larger than 
the number of triples, 0.0726X^/^, considered for d < 0. 

It is impossible to precisely pin down, theoretically, the constant factor savings in the 
runtime of our method for d < compared to the approach used for c? > as it depends 
on the speed of the various arithmetic and memory operations on particular hardware. 
Furthermore, these are not easily quantifiable as they change according to how the various 
resources of the machine are being used at a given moment. Another obstacle to a precise 
comparison is that one would need to take into account implementation choices made by 
the programmer and also by the compiler at the minutest of levels. 

Nonetheless, the rough comparison between the lengths of the main loops involved, i.e. 
2.28X1 for d > and 0.0726^^/^ for d < (see equation (34)), does refiect the different 
runtimes when compared experimentally. 

We ran our computation for d < on mod.math.washington.edu, which is a Sun Fire 
X4450, dating from 2008, with 24 Intel Xeon X7640 2.66 GHz CPUs (we used 12 of them), 
and 128 GB RAM. Our computation for d > was carried out on pilatus .uwaterloo . ca 
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which is an older SGI Altix 3700 machine, dating from around 2003, with 64 Intel Madison 
Itanium CPUs (we used 55 of these) running at 1.3 GHz, and 192 GB of RAM. 

Our computation, for d > 0, took roughly 18.9 CPU years, and about 3.9 CPU years 
for d < 0. Recall that we went up to < —d < 5 x 10^°, whereas for d > 0, we managed to 
get to 1.3 X 10^°. So not only did our computation for d < take much less CPU time, but 
we went significantly further. To make this more meaningful, we should compare intervals 
of similar length, i.e. the subset of < —d < 1.3 x 10^°. This interval required .4 CPU 
years, i.e. about 47 times faster than our computation for the interval < < 1.3 x 10^°. 
However, because different machines were used for d < and d > 0, we should compensate 
by dividing the time used for d > by a factor of 2.5 to account for the fact that these d 
were handled on an older and slower machine. The value of 2.5 was decided by rerunning 
select blocks of d > on both machines, using the same C++ code, and comparing their 
runtimes, which were about 2-3 times faster on the newer machine. Therefore, dividing 47 
by 2.5, our code ran around 20 times faster for d < than it did for d > 0, consistent with 
our rough expectations based on the lengths in both methods of the main loops. 
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